8 research outputs found

    Exploring query execution strategies for JIT vectorization and SIMD

    This paper partially explores the design space for efficient query processors on future hardware that is rich in SIMD capabilities. It departs from two well-known approaches: (1) interpreted block-at-a-time execution (a.k.a. "vectorization") and (2) "data-centric" JIT compilation, as in the HyPer system. We argue that in between these two design points in terms of granularity of execution and uni…
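
    The block-at-a-time "vectorization" approach contrasted above can be sketched minimally as follows. This is an illustrative sketch, not code from the paper; the names (`VECTOR_SIZE`, `map_mul`, `map_add`, `vectorized_eval`) are assumptions for exposition.

    ```python
    # Sketch: block-at-a-time ("vectorized") interpretation.
    # Interpretation overhead is paid once per vector of tuples instead of
    # once per tuple, and each primitive is a tight, SIMD-friendly loop.

    VECTOR_SIZE = 1024  # typical vector length, chosen to fit in cache

    def map_mul(out, a, b, n):
        # primitive: multiply two attribute vectors element-wise
        for i in range(n):
            out[i] = a[i] * b[i]

    def map_add(out, a, b, n):
        # primitive: add two attribute vectors element-wise
        for i in range(n):
            out[i] = a[i] + b[i]

    def vectorized_eval(col_a, col_b, col_c):
        """Evaluate a*b + c one vector at a time."""
        result = []
        tmp = [0] * VECTOR_SIZE
        out = [0] * VECTOR_SIZE
        for start in range(0, len(col_a), VECTOR_SIZE):
            n = min(VECTOR_SIZE, len(col_a) - start)
            a = col_a[start:start + n]
            b = col_b[start:start + n]
            c = col_c[start:start + n]
            map_mul(tmp, a, b, n)     # one interpreted call per vector...
            map_add(out, tmp, c, n)   # ...not per tuple
            result.extend(out[:n])
        return result
    ```

    A data-centric JIT compiler would instead fuse both primitives into a single generated loop per query; the paper's point is that intermediate granularities between these two extremes are worth exploring.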

    Charting the design space of query execution using VOILA

    Database architecture, while having been studied for four decades now, has delivered only a few designs with well-understood properties. These few are followed by most actual systems. Acquiring more knowledge about the design space is a very time-consuming process that requires manually crafting prototypes, with a low chance of generating material insight. We propose a framework that aims to accelerat…

    Highlighting the performance diversity of analytical queries using VOILA

    Hardware architecture has long influenced software architecture, and notably so in analytical database systems. Currently, we see a new trend emerging: A "tectonic shift" away from X86-based platforms. Little is (yet) known on how this shift affects database system performance and, consequently, should influence the design choices made. In this paper, we investigate the performance characteristics of X86, POWER, ARM and RISC-V hardware on micro- as well as macro-benchmarks on a variety of analytical database engine designs. Our tool to do so is VOILA: a new database engine generator framework that from a single specification can generate hundreds of different database architecture engines (called "flavors"), among which well-known design points such as vectorized and data-centric execution. We found that performance on different queries by different flavors varies significantly, with no single best flavor overall, and per query different flavors winning, depending on the hardware. We think this "performance diversity" motivates a redesign of existing – inflexible – engines towards hardware- and query-adaptive ones. Additionally, we found that modern ARM platforms can beat X86 in terms of overall performance by up to 2×, provide up to 11.6× lower cost per instance, and up to 4.4× lower cost per query run. This is an early indication that the best days of X86 are over.

    Efficient query processing with Optimistically Compressed Hash Tables & Strings in the USSR

    Modern query engines rely heavily on hash tables for query processing. Overall query performance and memory footprint is often determined by how hash tables and the tuples within them are represented. In this work, we propose three complementary techniques to improve this representation: Domain-Guided Prefix Suppression bit-packs keys and values tightly to reduce hash table record width. Optimistic Splitting decomposes values (and operations on them) into (operations on) frequently-accessed and infrequently-accessed value slices. By removing the infrequently-accessed value slices from the hash table record, it improves cache locality. The Unique Strings Self-aligned Region (USSR) accelerates handling frequently-occurring strings, which are very common in real-world data sets, by creating an on-the-fly dictionary of the most frequent strings. This allows executing many string operations with integer logic and reduces memory pressure. We integrated these techniques into Vectorwise. On the TPC-H benchmark, our approach reduces peak memory consumption by 2–4× and improves performance by up to 1.5×. On a real-world BI workload, we measured a 2× improvement in performance and in micro-benchmarks we observed speedups of up to 25×.
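
    The bit-packing idea behind Domain-Guided Prefix Suppression can be sketched as below. This is a hedged illustration, not the Vectorwise implementation; the helper names (`bits_needed`, `pack`, `unpack`) are assumptions. When the optimizer knows the domain of a key (say, 0–9999), only 14 bits are needed for it, so key and value can share one machine word instead of two, halving the hash-table record width.

    ```python
    # Sketch: pack a key and a value into a single word, given the
    # number of bits the key's domain requires.

    def bits_needed(domain_max):
        # bits required to represent any value in [0, domain_max]
        return max(1, domain_max.bit_length())

    def pack(key, value, key_bits):
        # key occupies the low bits, value the bits above it
        return (value << key_bits) | key

    def unpack(word, key_bits):
        key = word & ((1 << key_bits) - 1)
        value = word >> key_bits
        return key, value
    ```

    For example, a key in 0–9999 needs 14 bits, leaving 50 bits of a 64-bit word for the value; narrower records mean more hash-table entries per cache line.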

    Optimistically compressed Hash Tables & Strings in the USSR

    Modern query engines rely heavily on hash tables for query processing. Overall query performance and memory footprint is often determined by how hash tables and the tuples within them are represented. In this work, we propose three complementary techniques to improve this representation: Domain-Guided Prefix Suppression bit-packs keys and values tightly to reduce hash table record width. Optimistic Splitting decomposes values (and operations on them) into (operations on) frequently- and infrequently-accessed value slices. By removing the infrequently-accessed value slices from the hash table record, it improves cache locality. The Unique Strings Self-aligned Region (USSR) accelerates handling frequently occurring strings, which are widespread in real-world data sets, by creating an on-the-fly dictionary of the most frequent strings. This allows executing many string operations with integer logic and reduces memory pressure. We integrated these techniques into Vectorwise. On the TPC-H benchmark, our approach reduces peak memory consumption by 2–4× and improves performance by up to 1.5×. On a real-world BI workload, we measured a 2× improvement in performance and in micro-benchmarks we observed speedups of up to 25×.
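
    The USSR's on-the-fly dictionary idea can be sketched roughly as follows. The class name and interface here are assumptions for illustration, not the Vectorwise API: frequent strings are interned into a bounded region, after which string equality reduces to comparing small integer codes.

    ```python
    # Sketch: a bounded on-the-fly dictionary of frequent strings.
    # Interned strings get stable integer codes, so equality checks and
    # hashing on them become cheap integer operations.

    class StringRegion:
        def __init__(self, capacity=8):
            self.codes = {}      # string -> integer code
            self.strings = []    # integer code -> string
            self.capacity = capacity

        def intern(self, s):
            """Return the integer code for s, or None if the region is
            full and s is not already present (caller falls back to
            ordinary string handling)."""
            code = self.codes.get(s)
            if code is None:
                if len(self.strings) >= self.capacity:
                    return None
                code = len(self.strings)
                self.codes[s] = code
                self.strings.append(s)
            return code
    ```

    Two strings with the same code are equal by construction, so a group-by on a low-cardinality string column can compare codes instead of byte sequences.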

    Optimizing group-by and aggregation using GPU-CPU co-processing

    While GPU query processing is a well-studied area, real adoption is limited in practice, as GPU execution is typically only significantly faster than CPU execution if the data resides in GPU memory, which limits scalability to small-data scenarios where performance tends to be less critical. Another problem is that not all query code (e.g. UDFs) will realistically be able to run on GPUs. We therefore investigate CPU-GPU co-processing, where both the CPU and GPU are involved in evaluating the query in scenarios where the data does not fit in GPU memory. As we wish to deeply explore opportunities for optimizing execution speed, we narrow our focus further to a specific well-studied OLAP scenario amenable to such co-processing, in the form of the TPC-H benchmark Query 1. For this query, and at large scale factors, we are able to improve performance significantly over the state-of-the-art for GPU implementations; we present competitive performance of a GPU versus a state-of-the-art multi-core CPU baseline, a novelty for data exceeding GPU memory size; and finally, we show that co-processing does provide significant additional speedup over either of the processors individually. We achieve this performance improvement by utilizing parallelism-friendly compression to alleviate the PCIe transfer bottleneck, query-compilation-like fusion of the processing operations, and a simple yet effective scheduling mechanism. We hope that some of these features can inspire future work on GPU-focused and heterogeneous analytic DBMSes.
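
    One parallelism-friendly compression scheme of the kind the abstract alludes to is frame-of-reference encoding, sketched below. The specific scheme and names (`for_encode`, `for_decode`) are assumptions, not necessarily what the paper uses: each block stores a base value plus narrow per-value deltas, so less data crosses the PCIe bus and every block can be decoded independently, in parallel, on the GPU.

    ```python
    # Sketch: frame-of-reference compression for one block of integers.
    # Deltas from the block minimum need far fewer bits than raw values,
    # and blocks are self-contained, so decoding parallelizes trivially.

    def for_encode(block):
        base = min(block)
        deltas = [v - base for v in block]
        width = max(d.bit_length() for d in deltas) or 1  # bits per delta
        return base, width, deltas

    def for_decode(base, width, deltas):
        # width would drive the bit-unpacking kernel on a real GPU;
        # here we just reverse the delta encoding
        return [base + d for d in deltas]
    ```

    For a block of order keys clustered around a large base, the deltas fit in a few bits each, cutting transfer volume by an order of magnitude before the fused processing kernels ever run.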

    VOILA
